Abstract

When two or more self-interested agents execute their plans in the same environment, conflicts may arise as a consequence, for instance, of a shared use of resources. In this case, an agent can postpone the execution of a particular action, if this alone solves the conflict, or it can resort to executing a different plan if the agent’s payoff diminishes significantly due to the action deferral. In this paper, we present a game-theoretic approach to non-cooperative planning that helps predict, before execution, which plan schedules agents will adopt so that the set of strategies of all agents constitutes a Nash equilibrium. We perform some experiments and discuss the solutions obtained with our game-theoretic approach, analyzing how the conflicts between the plans determine the strategic behavior of the agents.


Introduction 


Multi-agent Planning (MAP) with self-interested agents is the problem of coordinating a group of agents that compete to make their strategic behavior prevail over the others’: agents competing for a particular goal or for the use of a common resource, agents competing to maximize their benefit, or agents willing to form coalitions with others in order to better achieve their own goals or preferences. In this paper, we focus on game-theoretic MAP approaches for self-interested agents.

Brafman et al. (Brafman et al. 2009) introduce the Coalition-Planning Game (CoPG), a game-theoretic approach for self-interested agents that have personal goals and costs but may find it beneficial to cooperate with each other, provided that forming a coalition increases their personal net benefit. In particular, the authors propose a theoretical framework for stable planning in acyclic CoPGs, which is limited to one goal per agent. Following the line of CoPG, the work in (Crosby and Rovatsos 2011) presents an approach that combines heuristic calculations in existing planners for solving a restricted subset of CoPGs. In general, there has been rather intensive research on cooperative self-interested agents, for example, for modeling the behavior of planning agents in groups (Hadad et al. 2013) and in coalitional resource game scenarios (Dunne et al. 2010), among others.

On the other hand, game-theoretic non-cooperative MAP approaches aim, in general, at finding a Nash Equilibrium joint plan out of the individual plans of the agents. ‘Pure’ game-theoretic approaches, like (Bowling, Jensen, and Veloso 2003) and (Larbi, Konieczny, and Marquis 2007), perform a strategic analysis of all possible agent plans and define notions of equilibria by analyzing the relationships between different solutions in game-theoretic terms. In (Bowling, Jensen, and Veloso 2003), MAP solutions are classified according to the agents’ possibility of reaching their goals and the paths of execution (combinations of local plans). Similarly, satisfaction profiles in (Larbi, Konieczny, and Marquis 2007) are defined by the level of assurance of reaching the agent’s goals. A different approach using best-response was proposed to solve congestion games and to perform plan improvement in general MAP scenarios from an available initial joint plan (Jonsson and Rovatsos 2011).

Game-theoretic approaches that evaluate every strategy of every agent against all other strategies are ineffective for planning since, even if plan length is bounded polynomially, the number of available strategies is exponential (Nissim and Brafman 2013). However, in environments where cooperation is not allowed or calculating an initial joint plan is not possible, game-theoretic approaches are useful. Take, for instance, the modeling of a transportation network, sending packets through the Internet, or network traffic, where individuals need to evaluate routes in the presence of the congestion resulting from the decisions made by themselves and everyone else (Easley and Kleinberg 2010). In this sense, we argue that game-theoretic reasoning is a valid approach for this specific type of planning problem, among others.

In this paper, we present a novel game-theoretic non-cooperative model for MAP with self-interested agents that solves the following problem. We consider a group of agents where each agent has one or several plans that achieve one or more goals. Executing a particular plan yields a benefit to the agent that depends on the number of goals achieved, the makespan of the plan or the cost of the actions. Agents operate in a common environment, which may provoke interactions between the agents’ plans and thus prevent their concurrent execution. Each agent wants to execute the plan that maximizes its benefit, but it does not know which plan the other agents will choose, how its plan will be interleaved with theirs, or the impact of such coordination on its benefit.

We present a two-game proposal to tackle this problem: a general game in which agents take a strategic decision on which joint plan to execute, and an internal game that, given one plan per agent, returns an equilibrium joint plan schedule. Agents play the internal game to simulate the simultaneous execution of their plans and to find out the possibilities of coordinating in case of interactions and the effect of such coordination on their final benefit. The approach of the general game is very similar to the work described in (Larbi, Konieczny, and Marquis 2007); specifically, our proposal contributes several novelties:

• Introduction of soft goals to account for the case in which a joint plan that achieves all the goals of every agent is not feasible due to the interactions between the agents’ plans. The aim of the general game is precisely to select an equilibrium joint plan that encompasses the ‘best’ plan of each agent.

• An explicit handling of conflicts between actions and a mechanism for updating the plan benefit based on the penalty derived from the conflict repair. This is precisely the objective of the internal game and the key contribution that makes our model a more realistic approach to MAP with self-interested agents.

• An implementation of the theoretical framework, using the Gambit tool (McKelvey, McLennan, and Turocy 2014) for solving the general game and our own program for the internal game.

We wish to highlight that the model presented in this paper is not intended to solve a complete planning problem, due to the exponential complexity inherent to game-theoretic approaches. The model is aimed at solving a specific situation in which the alternative plans of the agents are restricted to that situation, so plans are expected to be of relatively similar and small size.

The paper is organized as follows. The next section provides an overview of the problem, introduces the notation that we will use throughout the paper and describes the general game in detail. The following section is devoted to the specification of the internal game, which we call the joint plan schedule game. Section ‘Experimental results’ shows some experiments carried out with our model, and the last section concludes.

Problem Specification 

The problem we want to solve is specified as follows. There is a set of $n$ rational, self-interested agents $N = \{1, \ldots, n\}$ where each agent $i$ has a collection of independent plans $\Pi_i$ that accomplish one or several goals. Executing a particular plan $\pi$ provides the owner agent a real-valued benefit given by the function $\beta : \Pi \to \mathbb{R}$. The benefit that agent $i$ obtains from plan $\pi$ is denoted by $\beta_i(\pi)$; in this work, we make this value dependent on the number of goals achieved by $\pi$ and the makespan of $\pi$, but different measures of reward and cost might be used, like the relevance of the achieved goals to agent $i$ or the cost of the actions of $\pi$. Each agent $i$ wishes to execute a plan $\pi$ that attains $\max(\beta_i(\pi)), \forall \pi \in \Pi_i$; however, since agents have to execute their plans simultaneously in a common environment, conflicts may arise that prevent agents from executing their preferred plans. Let us assume that $\pi$ and $\pi'$ are the maximum benefit plans of agents $i$ and $j$, respectively, and that the simultaneous execution of $\pi$ and $\pi'$ is not possible due to a conflict between the two plans. If this happens, several options are analyzed:

• agent $i$ (agent $j$, respectively) considers adapting the execution of its plan $\pi$ ($\pi'$, respectively) to the plan of the other agent by, for instance, delaying the execution of one or more actions of $\pi$ so that this delay solves the conflict. This has an impact on $\beta_i(\pi)$ since any delay in the execution of $\pi$ diminishes the value of its original benefit.

• agent $i$ (agent $j$, respectively) considers switching to another plan in $\Pi_i$ ($\Pi_j$, respectively) which does not cause any conflict with the plan $\pi'$ ($\pi$, respectively).

Agents wish to choose their maximum benefit plan, but the choices of the other agents can affect each agent’s benefit. This is the reason we propose a game-theoretic approach to solve this problem.

A plan $\pi$ is defined as a sequence of non-temporal actions $\pi = [a_1, \ldots, a_m]$.$^1$ Assuming $t = 0$ is the start time when the agents begin the execution of one of their plans, the execution of $\pi$ would ideally take $m$ units of time, executing $a_1$ at time $t = 0$ and the rest of actions at consecutive time instants, thus finishing the execution of $\pi$ at time $t = m - 1$ (the last action is scheduled at $m - 1$). This is called the earliest plan execution, as it denotes that the start time and finish time of the execution of $\pi$ are scheduled at the earliest possible times. However, if conflicts between $\pi$ and the plans of other agents arise, then the actions of $\pi$ might not be executable at their earliest times, in which case a tentative solution could be to delay the execution of some action in $\pi$ so as to avoid the conflict. Therefore, given a plan $\pi$, we can find infinitely many schedules for the execution of $\pi$.

Definition 1 Given a plan $\pi = [a_1, \ldots, a_m]$, $\Psi_\pi$ is an infinite set that contains all possible schedules for $\pi$. Particularly, we define as $\psi_0$ the earliest plan execution of $\pi$, which finishes at time $m - 1$. Given two different schedules $\psi_j, \psi_{j+1} \in \Psi_\pi$, the finish time of $\psi_j$ is prior or equal to the finish time of $\psi_{j+1}$.

Let $\psi_j$, where $j \neq 0$, be a schedule for $\pi$ that finishes at time $t > m - 1$. The net benefit that the agent obtains with $\psi_j$ diminishes with respect to $\beta_i(\pi)$. The loss of benefit is a consequence of the delayed execution of $\pi$, and this delay may affect agents differently. For instance, if for agent $i$ the delay of $\psi_j$ with respect to $\psi_0$ has little effect on $\beta_i(\pi)$, then $i$ might still wish to execute $\psi_j$. However, for a different agent $k$, a particular schedule of a plan $\pi' \in \Pi_k$ may have a great impact on $\beta_k(\pi')$, even resulting in a negative net benefit. How delays affect the benefit of the agents depends on the intrinsic characteristics of the agents.

Definition 2 We define a utility function $\mu : \Psi \to \mathbb{R}$ that returns the net value of a plan schedule. Thus, $\mu_i(\psi_j)$, $\psi_j \in \Psi_\pi$, is the utility that agent $i$ receives from executing the schedule $\psi_j$ for plan $\pi$. By default, for any given plan $\pi$ and $\psi_0 \in \Psi_\pi$, $\mu_i(\psi_0) = \beta_i(\pi)$.

1 In this first approach, we consider only instantaneous actions.

A rational way of solving the conflicts of interest that arise among a set of self-interested agents who all wish to execute their maximum benefit plan comes from non-cooperative game theory. Therefore, our general game is modeled as a non-cooperative game in normal form. The agents are the players of the game; the set of actions $A_i$ is modeled as the game actions (plans) available to agent $i$, and the payoff function is defined as the result of a rational selection of a plan schedule for each agent. Formally:

Definition 3 We define our general game as a tuple $(N, P, \rho)$, where:

• $N = \{1, \ldots, n\}$ is the set of $n$ self-interested players.

• $P = P_1 \times \ldots \times P_n$, where $P_i = \Pi_i, \forall i \in N$. Each agent $i$ has a finite set of strategies, which are the plans contained in $\Pi_i$. We will then call a plan profile the $n$-tuple $p = (p_1, p_2, \ldots, p_n)$, where $p_i \in \Pi_i$ for each agent $i$.

• $\rho = (\rho_1, \ldots, \rho_n)$, where $\rho_i : P \to \mathbb{R}$ is a real-valued payoff function for agent $i$. $\rho_i(p)$ is defined as the utility of the schedule of plan $p_i$ when $p_i$ is executed simultaneously with $(p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n)$.

The plan profile $p$ represents the plan choice of each agent. Every agent $i$ wishes to execute the schedule $\psi_0 \in \Psi_{p_i}$. Since this may not be feasible, agents have to agree on a joint plan schedule. We define a procedure named joint plan schedule that receives as input a plan profile $p$ and returns a schedule profile $s = (s_1, s_2, \ldots, s_n)$, where $\forall i \in N, s_i \in \Psi_{p_i}$. The schedule profile $s$ is a consistent joint plan schedule; i.e., all of the individual plan schedules in $s$ can be simultaneously executed without provoking any conflict. The joint plan schedule procedure, whose details are given in the next section, defines our internal game.

Let $p = (p_1, p_2, \ldots, p_n)$ be a plan profile and $s = (s_1, s_2, \ldots, s_n)$ the schedule profile for $p$. Then, we have that $\rho_i(p) = \mu_i(s_i)$.

The game returns a scheduled plan profile that is a Nash Equilibrium (NE) solution. This represents a stable solution in which no agent benefits from invalidating another agent’s plan schedule.

The joint plan schedule game 

This section describes the internal game. The problem consists in finding a feasible joint plan schedule for a given plan profile $p = (p_1, p_2, \ldots, p_n)$, where each agent $i$ wishes to execute its plan $p_i$ under the earliest plan schedule ($\psi_0$). Since potential conflicts between the actions of the plans of different agents may prevent some of them from executing $\psi_0$, agents engage in a game in order to come up with a rational decision that maximizes their expected utility.

For a particular plan $\pi$, an action $a \in \pi$ is given by the triple $a = (pre(a), add(a), del(a))$, where $pre(a)$ is the set of conditions that must hold in a state $S$ for the action to be applicable, $add(a)$ is its add list, and $del(a)$ is its delete list, each a set of literals. Let $a$ and $a'$ be two actions, both scheduled at time $t$, in the plans of two different agents; a conflict between $a$ and $a'$ occurs at $t$ if the two actions are mutually exclusive (mutex) at $t$ (Blum and Furst 1997).
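As a rough illustration of this conflict test, the following sketch checks a simplified mutex condition between two actions given as (pre, add, del) triples. It assumes that two actions are mutex when one deletes a precondition or an add effect of the other, which covers only part of the mutex notion of (Blum and Furst 1997). The names `Action` and `are_mutex` are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Action:
    """A STRIPS-like action: preconditions, add list and delete list."""
    name: str
    pre: FrozenSet[str] = frozenset()
    add: FrozenSet[str] = frozenset()
    delete: FrozenSet[str] = frozenset()


def are_mutex(a: Action, b: Action) -> bool:
    """Simplified mutex test (assumption): interference or inconsistent effects.

    Two actions conflict if either one deletes a literal that the other
    requires as a precondition or produces as an add effect.
    """
    def clobbers(x: Action, y: Action) -> bool:
        return bool(x.delete & (y.pre | y.add))

    return clobbers(a, b) or clobbers(b, a)


# Hypothetical actions over a single literal p.
p = frozenset({"p"})
a1 = Action("a1")
a2 = Action("a2", pre=p)
b1 = Action("b1", pre=p, delete=p)

print(are_mutex(a2, b1))  # b1 deletes p, which a2 needs
print(are_mutex(a1, b1))  # a1 neither needs nor produces p
```

A complete test in the sense of the planning-graph literature would also consider competing needs between preconditions; that part is omitted here.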

The joint plan schedule game is actually the result of simulating the execution of all the agents’ plans. At each time $t$, every agent $i$ makes a move, which consists in executing the next action $a$ in its plan $p_i$ or executing the empty action ($\perp$). The empty action is the default mechanism to avoid two actions that are mutex at $t$, and it implies a deferral in the execution of $a$. A concept similar to $\perp$, called the empty sequence, is used in (Larbi, Konieczny, and Marquis 2007) as a neutral element for calculating the permutations of the plans of two agents, although the particular implication of this empty sequence in the plan or in the evaluation of the satisfaction profiles is not described.


Search space of the internal game 

Several issues must be considered when creating the search 
space of the internal game: 

1) Simultaneous and sequential execution of the game.

The internal game is essentially a multi-round sequential game since the simulation of the plan execution occurs over time, one action of each player at a time. The execution at time $t + 1$ only takes place when every agent has moved at time $t$, so that players observe the choices of the rest of agents at $t$. In contrast, the game at time $t$ represents the simultaneous moves of the agents at that time. Simultaneous moves can always be rephrased as sequential moves with imperfect information, in which case agents would likely get ‘stuck’ if their actions are mutex; that is, agents would not have the possibility of coordinating their actions. Therefore, simultaneous moves at $t$ are also simulated as sequential moves, as if agents knew the intention of the other agents. In essence, this can be interpreted as agents analyzing the possibilities of avoiding the conflict and then playing simultaneously the choice that yields a stable solution. Obviously, this means that agents would know the strategies of the others at time $t$, which seems reasonable if they are all interested in maximizing their utility.

2) Applicability of the actions. Unlike other games where the agents’ strategies are always applicable, in planning it may happen that an action $a$ of a plan is not executable at time $t$ in the state resulting from the execution of the $t - 1$ previous steps. In such a case, the schedule profile is discarded. In our model, a schedule profile $s$ is a solution if $s$ comprises a plan schedule for every agent. Otherwise, we would be considering coalitions of agents that discard strategies that do not fit with the strategies of the coalition members. On the other hand, $\perp$ is only applicable at $t$ if at least one other agent applies a non-empty action at $t$. The empty action is also applicable when the agent has played all the actions of its plan.

Example. Consider a plan profile $p = (p_1, p_2)$ of two agents, where $p_1 = [a_1, a_2, a_3]$ and $p_2 = [b_1, b_2, b_3]$. $s = (s_1, s_2)$ with $s_1 = (a_1, \perp, \perp, a_2, a_3)$ and $s_2 = (\perp, b_1, b_2, b_3, \perp)$ is a valid joint schedule if all the actions scheduled at each time $t$ are not mutex.







Definition 4 Given a plan profile $p = (p_1, p_2, \ldots, p_n)$, $s = (s_1, s_2, \ldots, s_n)$ is a valid schedule profile for $p$ if every $s_i$ is a non-empty plan schedule and the actions of every $p_i$ scheduled at each time $t$ are not mutex.
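The validity conditions above can be sketched as a small checker applied to the two-agent example. This is only an illustration under stated assumptions: the empty action is encoded as `None`, the mutex relation is supplied as an explicit set of unordered pairs (the pairs used below are hypothetical, since the example does not specify them), and applicability of actions in the planning state is not checked.

```python
from itertools import combinations

EMPTY = None  # encoding of the empty action (assumption of this sketch)


def is_valid_profile(plans, schedules, mutex_pairs):
    """Check a schedule profile against a plan profile.

    plans:       list of action-name lists, one per agent
    schedules:   list of equal-length tuples with action names or EMPTY
    mutex_pairs: set of frozensets {x, y} of mutually exclusive actions
    """
    horizon = len(schedules[0])
    if any(len(s) != horizon for s in schedules):
        return False
    # Each schedule must contain exactly the actions of its plan, in order.
    for plan, s in zip(plans, schedules):
        if [a for a in s if a is not EMPTY] != list(plan):
            return False
    done = [0] * len(plans)  # number of actions already executed per agent
    for t in range(horizon):
        step = [s[t] for s in schedules]
        real = [a for a in step if a is not EMPTY]
        # No two actions scheduled at the same time may be mutex.
        if any(frozenset(pair) in mutex_pairs for pair in combinations(real, 2)):
            return False
        # An unfinished agent may idle only if some other agent acts at t.
        for i, a in enumerate(step):
            if a is EMPTY and done[i] < len(plans[i]) and not real:
                return False
        done = [d + (a is not EMPTY) for d, a in zip(done, step)]
    return True


plans = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
# Hypothetical conflicts: a2 cannot run together with b1 or b2.
mutex_pairs = {frozenset({"a2", "b1"}), frozenset({"a2", "b2"})}

s1 = ("a1", EMPTY, EMPTY, "a2", "a3")
s2 = (EMPTY, "b1", "b2", "b3", EMPTY)
print(is_valid_profile(plans, [s1, s2], mutex_pairs))

# The earliest schedules of both plans clash at t = 1 (a2 with b2).
earliest = [("a1", "a2", "a3"), ("b1", "b2", "b3")]
print(is_valid_profile(plans, earliest, mutex_pairs))
```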

Next, we formally define our internal game.

Definition 5 A perfect-information extensive-form game consists of:

• a set of players, $N = \{1, \ldots, n\}$

• a finite set $X$ of nodes that form the tree, with $S \subset X$ being the terminal nodes

• a set of functions that describe each $x \notin S$:

- the player $i(x)$ who moves at $x$

- the set $A(x)$ of possible actions at $x$

- the successor node $n(x, a)$ resulting from action $a$

• $n$ payoff functions that assign a payoff to each player as a function of the terminal node reached

Let $p_i = [a_1, \ldots, a_m]$ be the plan of agent $i$. The set $A(x)$ of possible actions of $i$ at $x$ is $A(x) = \{a, \perp\}$, where $a$ is the action of $p_i$ that has to be executed next, which is determined by the evolution of the game so far. Only in the case that agent $i$ has already played the $m$ actions of $p_i$, $A(x) = \{\perp\}$. As commented above, each agent makes one move at a time, so the first $n$ levels of the tree represent the moves of the $n$ agents at time $t$, the next $n$ levels represent the moves of the $n$ agents at $t + 1$, and so on.

A node $x$ of the game tree represents the planning state after executing the path from the root node until $x$. For each node $x$, there are at most two successor nodes, each corresponding to the application of one of the actions in $A(x)$. A terminal node $s$ denotes a valid schedule profile.

Let $s = (s_1, s_2, \ldots, s_n)$ be a terminal node; the payoff of player $i$ at $s$ is given by $\mu_i(s_i)$. Note that the solution of the internal game for a plan profile $p = (p_1, p_2, \ldots, p_n)$ is one of the terminal nodes of the game tree, and the payoff for each player $i$ represents the value of $\rho_i(p)$. Then, the payoff vector of the solution terminal node is the payoff vector of one of the cells in the general game.


Subgame Perfect Equilibrium (SPE) 

The solution concept we apply in our internal game is the Subgame Perfect Equilibrium (Shoham and Leyton-Brown 2009, Chapter 5), a concept that refines NE in perfect-information extensive-form games by eliminating unwanted Nash Equilibria. The SPE of a game are all strategy profiles that are NE for any subgame. By definition, every SPE is also a NE, but not every NE is a SPE. The SPE eliminates the so-called “noncredible threats”, that is, those situations in which an agent $i$ threatens the other agents to choose a node that is harmful for all of them, with the intention of forcing the other players to change their decisions, thus allowing $i$ to reach a more profitable node. However, this type of threat is not credible because a self-interested agent would not jeopardize its own utility.

A common method to find a SPE in a finite perfect-information extensive-form game is the backward induction algorithm. This algorithm has the advantage that it can be computed in time linear in the size of the game tree, in contrast to the best known methods to find NE, which require time exponential in the size of the normal form. In addition, it can be implemented as a single depth-first traversal of the game tree. We consider the SPE as the most adequate solution concept for our joint plan schedule game since SPE reflects the strategic behavior of a self-interested agent, taking into account the decisions of the rest of agents to reach the most preferable solution in a common environment.

The SPE solution concept provides us a strong argument to solve the problem of selecting a joint plan schedule as a perfect-information extensive-form game instead of using, for example, a planner that returns all possible combinations of the agents’ plans. In this latter case, the question would be which policy to apply to choose one schedule over the other. We could apply criteria such as Pareto-optimality$^2$ or the maximum social welfare$^3$. However, a Pareto-dominant solution does not always exist in all problems and the highest social welfare solution may be different from the SPE solution. That is, neither of these solution concepts would actually reflect how the fate of one agent is impacted by the actions of others.

The SPE solution concept also has some limitations. First, there could exist multiple SPE in a game, in which case one SPE may be chosen randomly. Second, the order of the agents when building the tree is relevant for the game in some situations. Consider, for instance, the case of a two-agent game. The application of the backward induction algorithm would give some advantage to the first agent in those cases for which there exist two different schedules to avoid the mutex (delaying one agent’s action over the other or vice versa). In this case, the first agent will select the solution that does not delay its conflicting action. Notice that in these situations both solutions are SPE and thus equally good from a game-theoretic perspective. Any other conflict-solving mechanism would also favour one agent over the other depending on the criteria used; for instance, a planner would favour the agent whose delay returns the shortest makespan solution, and a more social-oriented approach would give advantage to the agent whose delay minimizes the overall welfare. In order to alleviate the impact of the order of the agents in the SPE solution, agents are randomly chosen in the tree generation.

An example of an extensive-form tree for a particular joint plan schedule problem can be seen in Figure 1. The tree represents the internal game of two agents A and B with plans $\pi_A = [a_1, a_2]$ and $\pi_B = [b_1, b_2]$. The letter above an action represents its precondition, and the letter below represents its effects. Thus, $p \in pre(a_2)$, $p \in pre(b_1)$, $p \in del(b_1)$ and $p \in add(b_2)$. At each non-terminal node, the corresponding agent generates its successors; in case of a non-applicable action, the branch is pruned. For example, in node 2 agent A tries to apply its action $a_2$, but this is not possible because in that state a previous action $b_1$ deleted $p$. Another example of a non-applicable action is shown in the right branch of node 6. In this case, agent B tries to apply the empty action $\perp$, but this option is also discarded because agent A has also applied an empty action in the same time step ($t = 0$).

2 A vector Pareto-dominates another one if each of the components of the first one is greater than or equal to the corresponding component in the second one.

3 The sum of all agents’ utilities.

Figure 1: Tree example (terminal node js1 has payoffs (9,10))

In the tree example of Figure 1 we assume that both agents A and B have the same utility function, that a delay means a penalty proportional to the utility, and that $\beta(\pi_A) = \beta(\pi_B) = 10$. If we apply the backward induction algorithm to this extensive-form game, it returns the joint schedule profile js1, or its equivalent js4. This schedule profile yields the highest possible utility for agent B, and a penalty of one unit (generic penalty) for agent A. Let us see how the backward induction algorithm obtains the SPE in this example. The payoffs of js1 are backed up to node 2, where they will be compared with the values of node 5. The joint schedule js2 is backed up to node 5 because agent A is the one who chooses at node 5. Then, in node 1 agent B chooses between node 2 and node 5 and hence, js1 is chosen. In the other branches, in node 8 js4 will prevail over js5 and then, when compared in node 7 with js6, the choice of agent A is js4. This results in agent A choosing at node 0 between js1 and js4, both with the same payoffs, and so both are equivalent SPE solutions. If the tree is developed following a different agent order the SPE solution will be the same.
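The walkthrough above can be reproduced with a compact sketch of the internal game solved by backward induction in a single depth-first traversal. It is not the program used in the experiments. It assumes that $p$ holds in the initial state, that $a_1$ has no preconditions or effects, that mutex is the simplified interference test, that the penalty is one unit per time step of delay, that agents move in the order A, B within each time step, and that ties are broken in favour of the first option explored. All identifiers are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet

EMPTY = "_"  # printed name of the empty action


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str] = frozenset()
    add: FrozenSet[str] = frozenset()
    dele: FrozenSet[str] = frozenset()


def mutex(a, b):
    """Simplified mutex: one action deletes what the other needs or adds."""
    return bool(a.dele & (b.pre | b.add)) or bool(b.dele & (a.pre | a.add))


def solve(plans, init, benefits, penalty=1.0):
    """Return (payoffs, joint schedule) of an SPE, or None if no valid schedule."""
    n = len(plans)

    def payoffs(finish):
        # Utility = benefit - penalty * (finish time - earliest finish time).
        return tuple(benefits[i] - penalty * (finish[i] - (len(plans[i]) - 1))
                     for i in range(n))

    def node(state, t, nxt, finish, agent, chosen, trace):
        if agent == n:  # every agent has moved at time t
            real = [a for a in chosen if a is not None]
            if not real:
                return None  # all agents idle: branch discarded
            for a in real:
                state = state - a.dele
            for a in real:
                state = state | a.add
            trace = trace + (tuple(a.name if a else EMPTY for a in chosen),)
            if all(nxt[i] == len(plans[i]) for i in range(n)):
                return payoffs(finish), trace  # terminal node
            return node(state, t + 1, nxt, finish, 0, (), trace)
        options = []
        if nxt[agent] < len(plans[agent]):
            a = plans[agent][nxt[agent]]
            # Applicable in the state at the start of t and not mutex with
            # the actions already chosen by earlier movers at t.
            if a.pre <= state and not any(
                    c is not None and mutex(a, c) for c in chosen):
                options.append(a)
        options.append(None)  # the empty action
        best = None
        for a in options:
            nxt2, fin2 = nxt, finish
            if a is not None:
                nxt2 = nxt[:agent] + (nxt[agent] + 1,) + nxt[agent + 1:]
                fin2 = finish[:agent] + (t,) + finish[agent + 1:]
            res = node(state, t, nxt2, fin2, agent + 1, chosen + (a,), trace)
            # The mover keeps the child that maximizes its own payoff.
            if res is not None and (best is None or res[0][agent] > best[0][agent]):
                best = res
        return best

    return node(frozenset(init), 0, (0,) * n, (0,) * n, 0, (), ())


P = frozenset({"p"})
plan_a = [Action("a1"), Action("a2", pre=P)]
plan_b = [Action("b1", pre=P, dele=P), Action("b2", add=P)]
result = solve([plan_a, plan_b], init={"p"}, benefits=(10, 10))
print(result)
# ((9.0, 10.0), (('a1', 'b1'), ('_', 'b2'), ('a2', '_')))
```

Under these assumptions the returned payoffs are (9, 10), matching js1: agent B executes its plan without delay and agent A defers $a_2$ by one time step until $b_2$ restores $p$.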

Experimental results 

In this section, we present some experimental results in order to validate and discuss our approach. As several factors can affect the solutions of the general game, we show different examples of game situations.

We implemented a program to generate the extensive-form tree and apply the backward induction algorithm. The NE in the normal-form game is computed with the tool Gambit (McKelvey, McLennan, and Turocy 2014).

For the experiments we used problems of the well-known Zeno-Travel domain from the International Planning Competition (IPC-3)$^4$. However, for simplicity and the sake of clarity, we show generic actions in the figures.

4 http://ipc.icaps-conference.org/


The experiments were carried out for two agents, A and B. Both agents have a set of individual plans that solve one or more goals. The more goals achieved by a plan, the higher the benefit of the plan. In addition, the benefit of a plan depends on the makespan of the plan. Given a plan $\pi$, whose earliest plan execution is denoted by $\psi_0$, $\beta_i(\pi)$ is calculated as follows: $\beta_i(\pi) = nGoals(\pi) * 10 - makespan(\psi_0)$, where $nGoals(\pi)$ represents the number of goals solved by $\pi$ and $makespan(\psi_0)$ represents the duration of the minimum-duration schedule for $\pi$.

The utility of a particular schedule $\psi \in \Psi_\pi$ is a function of $\beta_i(\pi)$ and the number of time units that the actions of $\pi$ are delayed in $\psi$ with respect to the earliest plan execution $\psi_0$; in other words, the difference in the makespan of $\psi$ and $\psi_0$. Thus, $\mu_i(\psi) = \beta_i(\pi)$ if $\psi = \psi_0$. Otherwise, $\mu_i(\psi) = \beta_i(\pi) - delay(\psi)$, where $delay(\psi)$ is the delay in the makespan of $\psi$ with respect to the makespan of $\psi_0$.
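The two formulas can be written as a short sketch. The function names are illustrative, and the sample values (a plan solving two goals with a makespan of 3, delayed two time steps) are an assumption chosen to be consistent with a payoff of 15 for a plan that solves two goals and is delayed two steps.

```python
def benefit(n_goals: int, makespan: int) -> int:
    """Benefit of a plan: 10 units per solved goal minus the makespan
    of its earliest execution."""
    return n_goals * 10 - makespan


def utility(plan_benefit: int, delay: int) -> int:
    """Utility of a schedule: the plan benefit minus the delay with
    respect to the earliest execution (zero delay keeps the full benefit)."""
    return plan_benefit - delay


b = benefit(n_goals=2, makespan=3)
print(b)              # 17
print(utility(b, 0))  # 17, the earliest schedule
print(utility(b, 2))  # 15, the same plan delayed two time steps
```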



Table 1 shows the problems used in these experiments: the set of initial plans of each agent, the number of actions of each plan and its utility.



              $\pi_{B1}$       $\pi_{B2}$      $\pi_{B3}$
$\pi_{A1}$    15,16 (2,2)    17,7 (0,2)    17,9 (0,0)
$\pi_{A2}$    8,16 (0,2)     8,7 (0,2)     7,9 (1,0)
$\pi_{A3}$    7,18 (2,0)     9,9 (0,0)     8,9 (1,0)

Table 2: Problem 1


In Table 2 we can see the results of the general game for problem 1. Each cell is the result of a joint plan schedule game that combines a plan of agent A and a plan of agent B. In each cell, we show the payoff of $\pi_{Ax}$ and $\pi_{By}$ as well as the values of $delay(\psi)$ for each plan (delay values are shown in parentheses). The values in each cell are the result of the schedule profile returned by the internal game.

The NE of this problem is the combination of $\pi_{A1}$ and $\pi_{B1}$, with a utility of (15,16) for agents A and B, respectively. Agent A uses the plan that solves its goals $g_1$ and $g_2$, delayed two time steps. Agent B uses the plan that solves its goals $g_1$ and $g_2$, also delayed two time steps. The solution for both agents is to use the plan that solves more goals (with a higher initial benefit), slightly delayed. This can be a typical situation if there are not many conflicts and if the delay is not very punishing to the agents. The schedule of this solution is shown in Figure 2. We can see in the figure that agent A starts the execution of its plan $\pi_{A1}$ at $t = 0$, but after having scheduled its first two actions, the strategy of agent A introduces a delay of two time steps (empty actions) until it can finally execute its final action without causing a mutex with the actions of agent B. Regarding agent B, its first action in $\pi_{B1}$ is delayed two time units to avoid the conflict with agent A. In this example, both agents have a conflict with each other (both have an action which deletes a condition that the other agent needs).


Figure 2: Schedule example (time axis $t = 0, \ldots, 4$)

Table 3 represents the game in normal form of problem 2 shown in Table 1. In this case, we find three different equilibria: $(\pi_{A1}, \pi_{B2})$, with payoffs (15,14) and delays (3,2) for agents A and B, respectively; another NE is $(\pi_{A2}, \pi_{B1})$, with payoffs (14,15) and delays of (2,3) time steps, respectively; the last NE is a mixed strategy with probabilities 0.001 and 0.999 for $\pi_{A1}$ and $\pi_{A2}$ of agent A, and probabilities 0.001 and 0.999 for strategies $\pi_{B1}$ and $\pi_{B2}$ of agent B. In this problem we have a cell with $-\infty$ as the payoff of the two agents. This payoff represents that there does not exist a valid joint schedule for the plans due to an unsolvable conflict such as the one shown in Figure 3.



              $\pi_{B1}$            $\pi_{B2}$       $\pi_{B3}$      $\pi_{B4}$
$\pi_{A1}$    $-\infty,-\infty$     15,14 (3,2)    18,7 (0,2)    17,9 (1,0)
$\pi_{A2}$    14,15 (2,3)         14,14 (2,2)    16,6 (0,3)    16,9 (0,0)
$\pi_{A3}$    8,16 (0,2)          8,16 (0,0)     8,7 (0,2)     8,8 (0,1)
$\pi_{A4}$    7,18 (2,0)          6,16 (3,0)     9,9 (0,0)     8,9 (1,0)

Table 3: Problem 2
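The pure-strategy equilibria of Table 3 can be double-checked with a brute-force best-response test, sketched below. This is only an illustration and not the Gambit computation used in the experiments; in particular, it cannot find the mixed-strategy equilibrium. The identifiers are illustrative, and rows and columns are indexed from 0 (row 0 is $\pi_{A1}$, column 0 is $\pi_{B1}$).

```python
NEG_INF = float("-inf")

# Payoffs of Table 3: rows are the plans of agent A, columns the plans of
# agent B, and each cell is (payoff of A, payoff of B).
TABLE3 = [
    [(NEG_INF, NEG_INF), (15, 14), (18, 7), (17, 9)],
    [(14, 15), (14, 14), (16, 6), (16, 9)],
    [(8, 16), (8, 16), (8, 7), (8, 8)],
    [(7, 18), (6, 16), (9, 9), (8, 9)],
]


def pure_nash_equilibria(table):
    """Return all cells where each plan is a best response to the other."""
    rows, cols = len(table), len(table[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            pay_a, pay_b = table[i][j]
            best_a = max(table[k][j][0] for k in range(rows))  # A deviates
            best_b = max(table[i][k][1] for k in range(cols))  # B deviates
            if pay_a == best_a and pay_b == best_b:
                equilibria.append((i, j))
    return equilibria


print(pure_nash_equilibria(TABLE3))
# [(0, 1), (1, 0)]
```

The two cells returned correspond to $(\pi_{A1}, \pi_{B2})$ and $(\pi_{A2}, \pi_{B1})$.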


The game in Table 4 is the same game as the one in Table 3 but, in this case, the agents suffer a delay penalty of 3.5 (instead of 1) per each action delayed in their plan schedules. Under this new evaluation, we can see how this affects the general game. In this situation, the only NE solution is $(\pi_{A2}, \pi_{B2})$ with utility values (9,9) and a delay of two time steps for each agent. Note that this solution is neither Pareto-optimal (solution (16,9) is Pareto-optimal) nor does it maximize the social welfare. However, these two solution concepts can be applied in case of multiple NE.

Figure 3: Unsolvable conflict



              $\pi_{B1}$            $\pi_{B2}$        $\pi_{B3}$        $\pi_{B4}$
$\pi_{A1}$    $-\infty,-\infty$     7.5,9 (3,2)     18,2 (0,2)      14.5,9 (1,0)
$\pi_{A2}$    9,7.5 (2,3)         9,9 (2,2)       16,-1.5 (0,3)   16,9 (0,0)
$\pi_{A3}$    8,11 (0,2)          8,16 (0,0)      8,2 (0,2)       8,5.5 (0,1)
$\pi_{A4}$    2,18 (2,0)          -1.5,16 (3,0)   9,9 (0,0)       5.5,9 (1,0)

Table 4: Problem 2b, more delay penalty to the utility
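The relation between the payoffs of Table 3 and Table 4 can be illustrated with a short sketch: the original benefit is recovered by adding back the unit penalty, and the new penalty of 3.5 per delayed time step is then subtracted. The function name and its parameters are illustrative.

```python
def rescale(payoff: float, delay: int,
            old_penalty: float = 1.0, new_penalty: float = 3.5) -> float:
    """Convert a payoff computed with old_penalty into one with new_penalty."""
    plan_benefit = payoff + old_penalty * delay  # undo the original penalty
    return plan_benefit - new_penalty * delay


# Cell (pi_A1, pi_B2) of Table 3: payoffs (15, 14) with delays (3, 2).
print(rescale(15, 3), rescale(14, 2))  # 7.5 9.0

# Cell (pi_A2, pi_B3) of Table 3: payoffs (16, 6) with delays (0, 3).
print(rescale(16, 0), rescale(6, 3))   # 16.0 -1.5
```

Both results agree with the corresponding cells of Table 4.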


In conclusion, our approach simulates how agents behave 
with several strategies and it returns an equilibrium solution 
that is stable for all of the agents. All agents participate in 
the schedule profile solution and their utilities are dependent 
on the strategies of the other agents regarding the conflicts 
that appear in the problem. 

Conclusions and future work 

In this paper, we have presented a complete game-theoretic approach for non-cooperative agents. The strategies of the agents are determined by the different ways of solving mutex actions at a time instant and the loss of utility of the solutions in the plan schedules. We have also presented some experiments carried out in a particular planning domain. The results show that the SPE solution of the extensive-form game in combination with the NE of the general game returns a stable solution that responds to the strategic behavior of all of the agents.

As for future work, we intend to explore two different lines of investigation. The exponential cost of this approach represents a major limitation for its use as a general MAP method for self-interested agents. Our combination of a general and an internal game can be successively applied to subproblems of the agents. Considering that this approach solves a subset of goals of an agent, the agent could engage in a new game to solve the rest of its goals, and likewise for the rest of agents. Then, a MAP problem can be viewed as solving a subset of goals in each repetition of the whole game. In this line, the utility functions of the agents can be modeled not only to consider the benefit of the current schedule profile but also to predict the impact of this strategy profile on the resolution of the future goals. That is, we can define payoffs as a combination of the utility gained in the current game plus an estimate of how the joint plan schedule would impact the resolution of the remaining goals.

Another line of investigation is to extend this approach to cooperative games, allowing the formation of coalitions of agents if the coalition represents a more advantageous strategy than playing alone.